ABSTRACT

The computation and study of triangles in graphs is a stan- 
dard tool in the analysis of real-world networks. Yet most 
of this work focuses on undirected graphs. Real-world net- 
works are often directed and have a significant fraction of 
reciprocal edges. While there is much focus on directed tri- 
adic patterns in the social sciences community, most data 
mining and graph analysis studies ignore direction. 

But how do we make sense of this complex directed structure?
We propose a collection of directed closure values
that are analogues of the classic transitivity measure (the 
fraction of wedges that participate in triangles). We per- 
form an extensive set of triadic measurements on a variety 
of massive real-world networks. Our study of these values
reveals a wealth of information about the nature of direction.
For instance, we immediately see the importance of recipro- 
cal edges in forming triangles and can measure the power of 
transitivity. Surprisingly, the chance that a wedge is closed 
depends heavily on its directed structure. We also observe 
striking similarities between the triadic closure patterns of 
different web and social networks. 

Together with these observations, we also present the first 
sampling based algorithm for fast estimation of directed tri- 
angles. Previous estimation methods were targeted towards 
undirected triangles and could not be extended to directed 
graphs. Our method, based on wedge sampling, gives orders 
of magnitude speedup over state of the art enumeration. 



^Sandia National Laboratories is a multi-program labora- 
tory managed and operated by Sandia Corporation, a wholly 
owned subsidiary of Lockheed Martin Corporation, for the 
U.S. Department of Energy's National Nuclear Security Ad- 
ministration under contract DE-AC04-94AL85000. 
*This work was funded by the GRAPHS Program at 
DARPA and the Laboratory Directed Research and Devel- 
opment program at Sandia National Laboratories. 
1. INTRODUCTION

The study of triangles is by now a classic tool in the anal- 
ysis of large-scale networks. The focus on triangles has its 
roots in a variety of disciplines: in social sciences as a man- 
ifestation of various theories, in physics as local measures of 
clustering, in biology as motifs. Yet most contemporary data 
mining and massive graph analysis first convert real-world 
interaction data (think of this as a graph with attributes) 
into an undirected graph, and then work on this graph. This 
is a very fruitful method, since the complexity of the un- 
derlying problem is reduced, and we still get a significant 
amount of information. Nonetheless, it is a major challenge 
to account for the attributes on edges. 

The most common attribute for edges is direction. For 
example, most social networks, web networks, and product 
networks are all truly directed networks. Moreover, directed 
networks often have a significant percentage of reciprocal 
edges. Newman et al. [18] show that the fraction of such
edges in commonly studied graphs is quite high, and sub- 
sequent studies underlined the importance of such edges in 
virus/news spreading and understanding the network forma- 
tion [10, 17, 14]. 

The set of triangles (and wedges) involving directed and 
reciprocal edges is rich and holds information about the un- 
derlying dynamics [13, 16, 9, 8, 23]. But it is challenging to 
make sense of this information and also compare different 
graphs (from varied sources) along these metrics. Further- 
more, computation of triangles becomes quite expensive for 
large graphs. 

1.1 Some preliminaries 

We focus on a directed graph (digraph) G = (V, E). In a
standard digraph, all edges are just ordered pairs of vertices
of the form (i,j). We will think of the graph as having two
different types of edges: basic and reciprocal. A reciprocal
edge is technically a pair {(i,j), (j,i)}, which we merge into
a single reciprocal edge. We do not think of a reciprocal
edge as containing two directed edges, but consider it to be a
special edge on its own. In our figures, reciprocal edges are
depicted as double-headed arrows. We define the reciprocity of
a graph, r, as the ratio of the number of reciprocal edges to
the total number of edges. Note that our definition is slightly
different from that of [18].
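The merging of reciprocal pairs described above can be sketched in a few lines. This is only an illustration: the function names `split_edges` and `reciprocity` are hypothetical, and the sketch assumes that "the total number of edges" counts each merged reciprocal edge once.

```python
def split_edges(directed_pairs):
    """Split ordered pairs (i, j) into basic (one-way) and reciprocal edges."""
    # Drop self-loops and duplicate pairs (an assumption of this sketch).
    pairs = {(i, j) for i, j in directed_pairs if i != j}
    # A pair {(i, j), (j, i)} is merged into a single reciprocal edge.
    reciprocal = {frozenset((i, j)) for (i, j) in pairs if (j, i) in pairs}
    basic = {(i, j) for (i, j) in pairs if (j, i) not in pairs}
    return basic, reciprocal


def reciprocity(basic, reciprocal):
    """Ratio of reciprocal edges to all edges, each merged pair counted once."""
    total = len(basic) + len(reciprocal)
    return len(reciprocal) / total if total else 0.0


# Toy digraph: 1 <-> 2 is reciprocal; 2 -> 3 and 3 -> 1 are basic edges.
basic, recip = split_edges([(1, 2), (2, 1), (2, 3), (3, 1)])
r = reciprocity(basic, recip)  # one reciprocal edge out of three edges
```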

A wedge is a pair of edges that share an endpoint, and 
a triangle is a set of three (unparallel) edges that are inci- 
dent on a set of three vertices. We have 6 different types of 
wedges and 7 different types of triangles. We give more de- 
tails about these structures in §2. We give the list of directed 
wedges and triangles with reciprocity in Figure 1. The earliest
construction of this list is by Holland and Leinhardt [13].

1.2 Main results of this paper

• Definition of directed closures: We generalize the 
classic notion of transitivity (pg. 243 of [25]), which is also
called the global clustering coefficient, to digraphs. This 
leads to a set of 15 closure values that provide a triadic 
pattern of a digraph. 

• Computation of directed triangles and closures: 
We extend the basic method of wedge sampling [19, 20] to 
approximating the counts of directed triangles. This gives 
fast scalable algorithms with provable error bounds. While 
triangle counting is a well-studied problem, we present the 
first algorithm that works for digraphs. This algorithm en- 
ables efficient computation of all closure values. 

We perform experiments on a set of publicly available datasets. 
We present the directed closure information in a succinct 
form that allows comparison of different graphs. This leads 
to a series of observations. 



• Heterogeneity of closure: We find the closure frac- 
tions of wedges vary greatly depending on the wedge type. 
In-wedges ((iii) in Figure 1a) usually dominate the graph
but are rarely closed. In many cases, all other wedge types 
close frequently. 

• Reciprocity induces closure: For almost every graph 
we analyze, the presence of a reciprocal edge in a wedge 
greatly increases the chance of closure. In other words, 
wedges with reciprocal edges participate in triangles more 
frequently than (uniform) random wedges. 

• The power of transitivity: Loops and path-recip triangles
((b) and (d) in Figure 1b) are very infrequent. These
triangles contain a transitive wedge that is not reciprocated,
and the fact that they are so rare suggests the power of
transitivity in the underlying dynamics. This appears to validate
the importance of transitivity, as posited by Holland and 
Leinhardt [13] in the social science community (Recent re- 
sults of Leskovec et al. [15] in signed networks make compa- 
rable observations). These observations also underscore the 
importance of reciprocity, since this distinguishes triangles 
without transitivity from those that have it. 

• The non-randomness of direction: We define a sim- 
ple random model of direction in an underlying undirected 
graph and compute directed closure values for this model. 
The predictions from this are significantly different from the 
actual data, showing that our findings indicate a deep di- 
rected structure in real-world networks. 

What is the significance of these results? First, we feel that 
these observations show the importance of direction and reci- 
procity, which we believe is not emphasized enough in anal- 
yses of social networks. Designing meaningful measures re- 
lated to directed triangles and interpretable presentations is 
an important step in understanding digraphs. We also need 
efficient algorithms to compute such measures. We hope 
that this work is a step in this process. The wealth of infor- 
mation that is obtained by looking at directed closure values 
(at least in the authors' opinion) shows the importance of 
the directed closure values. 

These values also inform graph modeling because they pro- 
vide formal measures that models can be tested against. It 
has been observed before that existing graph models have 
little to no reciprocity [7], so no model can even come close 
to matching directed closures. We have no models that even 
come close to recreating the structure of digraphs. This is
probably a very difficult problem, but greater insight into directed
closures might help in making progress. 

An attribute of networks that we ignore is sign (a positive 
versus negative relationship). Many social science theories 
focus on sign in networks, and recent work by Leskovec et 
al. [15] studies signed networks. It would be interesting to 
extend our work to signed and directed networks. 

1.3 Previous work 

The earliest study of directed triads with reciprocity, to 
our knowledge, is in the social sciences, by Holland and Lein- 
hardt [13]. They explicitly list the 16 different possible triads 
(including the 3 patterns with at most one edge) and count 
them in various social networks of the time. They also try 
to measure the effects of reciprocity in network formation. 
This is called the triad census. Skvoretz [21] and Skvoretz 
et al. [22] use these numbers to predict various biases in
network formation. In a more recent study, Faust [9] computes
a triad census on many graphs to compare their structure. 
Most of this work has been restricted to small data sets
(at most hundreds of nodes). Finding such triads has been
referred to as motif finding in the bioinformatics community [16].
Simpler versions of triad census counts have also
been used to analyze gaming data [23].

A classic local measure of triangle density is the clustering
coefficient, introduced by Watts and Strogatz [26]. Fagiolo [8]
proposes a local clustering coefficient measure for directed 
networks, though he ignores reciprocity. Ahnert and Fink [2] 
construct "clustering coefficients signatures" from these mea- 
sures and classify directed networks. 

Leskovec et al. [15] study signed networks and validate
(and extend) the theory of balance [11, 3]. They study the
behavior of signed triangles to show that the theory of balance
does not suffice to explain networks. They also look at
direction, but their datasets do not involve much reciprocity.
It would be interesting to combine their work with our mea- 
sures of directed closures. 

2. THE DIRECTED CLOSURES 

We begin with some notation and an introduction to the
directed structures in Figure 1a and Figure 1b. We use small
Roman numerals to index the types of wedges, and small
Latin letters for triangles. Furthermore, $\psi$ is used to denote
a variable wedge type, and $\tau$ for a variable triangle type.
We also give some names for further reference. (Holland
and Leinhardt [13] have a naming scheme for directed triads
that involves a triple of numbers with a letter. We deviate
from this notation because it is easier to remember names
than 3 digit codes.)

We stress that these types form a partition of all wedges 
and triangles. Since reciprocal edges are special, we do not 
think of (say) the recip-out wedge as containing an out wedge.

For each vertex $v$, we have three associated degrees: the
indegree, outdegree, and reciprocal degree. These are
denoted by $d_v^{\leftarrow}$, $d_v^{\rightarrow}$, and $d_v^{\leftrightarrow}$. The total degree is
$d_v = d_v^{\leftarrow} + d_v^{\rightarrow} + d_v^{\leftrightarrow}$. We mention some of the salient features of these
directed structures.

• Basic vs reciprocal structures: The structures without
reciprocal edges form the first rows in both Figure 1a
and Figure 1b. There are only 3 such wedge types and 2 such
triangle types, underscoring the importance of reciprocity.

• Cyclic relations: Triangle types (b), (d), (f), (g) all 
contain a cycle, and there is a progression of 0, 1, 2, and 3 
reciprocal edges. 

• The table of $\chi(\psi,\tau)$ values: Different triangle types
naturally contain different types of wedges. This information
is summarized by the function $\chi(\psi,\tau)$, which we define
as the number of type $\psi$ wedges in type $\tau$ triangles. The
list of nonzero values of $\chi(\psi,\tau)$ is provided in Table 1. Each
row contains the wedge information of that triangle type.
There are 15 nonzero entries in this table.

• Wedge counts: For vertex $v$, let $W_{v,\psi}$ be the set of
$\psi$-wedges centered at $v$. It is routine to compute $|W_{v,\psi}|$ given
the degrees of $v$. This is summarized in Table 2.
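The per-vertex wedge counts can be sketched with a standard counting argument. Table 2 is not reproduced in this text, so the formulas below are a derivation under stated assumptions rather than a copy of its entries: a wedge centered at a vertex is an unordered pair of its incident edges, and the recip-out and recip-in types are assumed to pair one reciprocal edge with one outgoing or incoming basic edge, respectively. The function name `wedge_counts` is hypothetical.

```python
from math import comb


def wedge_counts(d_in, d_out, d_rec):
    """Number of wedges of each type centered at a vertex with the given degrees."""
    return {
        "out": comb(d_out, 2),       # two outgoing basic edges
        "path": d_in * d_out,        # one incoming and one outgoing basic edge
        "in": comb(d_in, 2),         # two incoming basic edges
        "recip-out": d_rec * d_out,  # one reciprocal edge, one outgoing edge
        "recip-in": d_rec * d_in,    # one reciprocal edge, one incoming edge
        "recip-tot": comb(d_rec, 2), # two reciprocal edges
    }


counts = wedge_counts(d_in=3, d_out=2, d_rec=1)
# The six types partition all pairs of incident edges at the vertex.
total = sum(counts.values())
```

Because the six types partition the wedges, the counts add up to the number of pairs of incident edges, `comb(d_in + d_out + d_rec, 2)`, which is a quick sanity check on any implementation.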

2.1 $(\psi,\tau)$-closure

The transitivity (or global clustering coefficient) is defined
as $3|T|/|W|$ ($T$ is the set of triangles and $W$ is the set of
wedges). Semantically, this is the fraction of wedges that
participate in triangles.

In the undirected setting, a wedge is called closed if it
participates in a triangle and open otherwise. We say that
a $\psi$-wedge is $\tau$-closed if the wedge participates in a type
$\tau$ triangle. The $(\psi,\tau)$-closure, $\kappa_{\psi,\tau}$, is the fraction of
$\psi$-wedges that are $\tau$-closed. Formally, let $W_\psi$ be the set of
$\psi$-wedges and $T_\tau$ be the set of $\tau$-triangles.

$$\kappa_{\psi,\tau} = \frac{\chi(\psi,\tau)\,|T_\tau|}{|W_\psi|}$$

The number of $\psi$-wedges in $\tau$-triangles is $\chi(\psi,\tau)|T_\tau|$. Note
that if type $\tau$ triangles contain no type $\psi$ wedge, then this
quantity is just zero because of $\chi(\psi,\tau)$. As mentioned earlier,
there are 15 non-trivial $(\psi,\tau)$-closures.

Table 1: Number of occurrences of each wedge type per
triangle type: $\chi(\psi,\tau)$.

Table 2: Number of wedges per vertex for each wedge type.
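As a small illustration of the definition, the closure can be computed from three counts. The function name `closure` and the numbers in the example are hypothetical; the undirected transitivity is recovered as the special case where every triangle contains 3 wedges.

```python
def closure(chi, num_triangles, num_wedges):
    """kappa_{psi,tau} = chi(psi, tau) * |T_tau| / |W_psi|."""
    if num_wedges == 0:
        return 0.0  # convention of this sketch: no wedges of this type, no closure
    return chi * num_triangles / num_wedges


# Undirected special case: each triangle contains 3 wedges, so this is 3|T|/|W|.
transitivity = closure(chi=3, num_triangles=10, num_wedges=120)

# If tau-triangles contain no psi-wedge, chi is 0 and the closure vanishes.
trivial = closure(chi=0, num_triangles=10, num_wedges=120)
```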

2.2 Representations 

We create a directed closure chart, which combines all the
$\kappa_{\psi,\tau}$ values. We give an example for the web-Google [27]
graph in Figure 2. The bars on the x-axis are indexed by
the different wedge types, and the y-axis is $\kappa_{\psi,\tau}$. We make
a stacked bar chart with the different closure values, where
the triangle types are shown in 7 different colors. For example,
the blue part of the first bar is the fraction of out-wedges
closing into trans triangles ($\kappa_{i,a}$). Some of the salient
features:

1. Single closure value: Consider some wedge type and
triangle type (say out-wedge and trans-triangle). The value
$\kappa_{i,a}$ is shown by the height of the blue part of the first bar.
The height of the blue part in the second bar shows the
fraction of path-wedges that are closed into a trans-triangle.

2. Total closure of wedge type: The total height of
the bar is the total fraction of closed wedges of that type. For
example, we see that in-wedges close infrequently.

3. Percentage of wedge type: Underneath the wedge 
pictures is the percentage of that wedge type. 

4. Percentage of triangle type: Underneath the leg- 
end for triangles is the percentage of that triangle type. 

5. Undirected transitivity: The value of $\kappa$ is marked
by a thick dashed line.
Figure 2: Directed closure for web-Google

Figure 3: Directed closure for web-Stanford

Figure 4: Directed closure for web-BerkStan

Figure 5: Directed closure for soc-Epinions1

Figure 6: Directed closure for livejournal

Figure 7: Directed closure for soc-Slashdot0902

Table 3: Properties of the graphs

3. OBSERVATIONS ON CLOSURE CHARTS

We analyze the directed closure properties of various real
graphs, whose properties are presented in Table 3. In this
table, $|V|$, $|E|$, $|W|$, and $|T|$ correspond to the number of
vertices, edges, wedges, and triangles, respectively. The
reciprocity r is the fraction of total edges that are reciprocal
edges. The undirected transitivity ($3|T|/|W|$) is given by $\kappa$.

3.1 Similarities of directed closures 

Figure 2, Figure 3, and Figure 4 have the closure charts 
for three different web graphs: web-Google, web-Stanford, 
and web-BerkStan [27]. These graphs have vertices for web- 
pages and directed edges for web links. Figure 5, Figure 6, 
and Figure 7 have the charts for three social networks [27]. 
The vertices of soc-Epinions are members of Epinions, a
consumer review site. A directed edge between users shows a
trust relationship originating from one user (these are signed
by trust/distrust, which we ignore). The vertices of
soc-Slashdot [27] are users and edges represent tagging as friend
or foe. The vertices of soc-livejournal [5, 1] are LiveJournal
users with edges denoting friendship (which is one-way).

Observe the uncanny similarity of the closure charts of the web
graphs, despite them being from different sources (and dif- 
ferent sizes). The color patterns are remarkably similar, 
showing similar distributions of different closures. The so- 
cial networks show more variation, but the overall structure 
of the charts is not far from the web graphs. In general, we 
note that in-wedges rarely close and reciprocal wedges close 
much more frequently. 

3.2 Heterogeneity of closure 

The heterogeneity of wedge closure is quite clear from all 
the closure charts. Focus on the web graphs. Other than 
in-wedges, all other wedge types close (quite) frequently. 
The undirected transitivity is always below 0.05, but specific 
wedge types close more than 50% of the time (shown by the 
total height of the bar). In-wedges form a dominant ma- 
jority of all wedges (more than 98%) but close infrequently. 
Indeed, the low value of transitivity is explained by the high 
percentage yet low closure of in-wedges. 

The picture is not as dramatic in the social networks, but 
there is some variation in closures over the wedge types. 
Quite consistently, in-wedges do not close and recip-tot- 
wedges close more frequently. 

3.3 Effect of reciprocity on closure rates 

How does reciprocity change the chance of closure? Observe
that in, path, and out-wedges contain no reciprocal
edges, recip-in and recip-out-wedges have exactly 1 reciprocal
edge, and recip-tot has 2 reciprocal edges. As the charts
clearly indicate, having reciprocal edges increases the chance
of closure of a wedge. We do a comprehensive calculation on a
variety of graphs in Figure 8.

Figure 8: How reciprocity increases closure rates: the x-axis
goes over various graphs. The colored bars correspond to
wedges with 0, 1, or 2 reciprocal edges. The y-axis gives the
fraction of those wedges that close (into any triangle).

Consider a graph and choose k from {0, 1, 2}. Fix the set 
of wedges with k reciprocal edges, and look at the fraction 
of those that close (into any triangle). This gives the data 
presented in Figure 8. Observe how there is consistently a 
monotonic (and often dramatic) increase in closure fractions 
as reciprocity increases. The average chance of closure for
a wedge without reciprocal edges is only 3%. But this num- 
ber goes to 23% if one of the edges is reciprocal and further 
increases to 38% when both edges are reciprocal. This find- 
ing is consistent with the earlier reports about reciprocal 
edges, indicating stronger ties between two vertices [18, 10, 
17, 14]. It also underscores how important it is to consider 
direction in networks. 

3.4 The power of transitivity 

Throughout the closure charts, one notices the infrequency
of loop and path-recip-triangles. These are colored light 
blue and yellow, and one can see how little of those colors 
are present (or one can directly look at their percentages). 
Let us focus on triangles that contain a cycle showing a 
"cyclic" relationship. These are exactly loop, path-recip, 2- 
recip, and 3-recip-triangles. (These are given in light blue, 
yellow, brown, and pink, respectively.) Now consider tran- 
sitive relations that are not reciprocated. For example, A 
connects to B who connects to C, but A does not connect 
to C. When a triangle contains a cycle, a reciprocated
transitive relationship creates a reciprocal edge.

Since loop-triangles have no reciprocal edges, there are 
three transitive relations that are not reciprocated. Analo- 
gously, for path-recip-triangle, there are two such unrecipro- 
cated relations. And for 2-recip and 3-recip triangles, these 
numbers are one and zero. 

So we ask, when a triangle contains a cycle, does it
contain unreciprocated transitive relations? One would think
that a cycle indicates a strong tie between three vertices,
and so reciprocation is expected. This is exactly what we
see in Figure 9, quite strongly over practically all graphs.
Almost all triangles with a cycle are either 2-recip or
3-recip-triangles. We almost never see any loop-triangles, shown by
the lack of light blue in Figure 9. Again, this is more
evidence that reciprocal edges play an important role in graph
structure. The results demonstrate the power of
transitivity in real-world networks. One can observe that social
relationships carried forward two steps (as a transitive
relation) almost always lead to reciprocation.

Figure 9: Power of transitivity: For each graph in our
collection, we plot the percentages of (different) triangle types
containing a cycle. Each bar corresponds to a single graph,
and the stacked bar chart gives the percentages of the 4
different triangle types. Note the dominance of pink and
brown (3-recip and 2-recip-triangles).

Table 5: Fractions of triangle types based on the null model.

Figure 10: Closure chart of web-Google for random directions:
We consider the undirected version of the web-Google
graph and add one-way and reciprocal edges according to the
random null model. Observe that the total closure for each
wedge is identical, and how different this is from Figure 2.

4. NULL MODELS FOR $(\psi,\tau)$-CLOSURE

In the previous section, we made several observations about
the $(\psi,\tau)$-closure rates in real graphs. How significant are
these results? Can they be explained merely by the
reciprocity of a graph? We propose a null hypothesis, based on
assigning the type of each edge based only on the reciprocity
of the graph. We start by making the graph undirected
and insert direction and reciprocity randomly as follows. If
$\{u, v\}$ is an undirected edge, we make it reciprocal with
probability $r$; we direct it from $u$ to $v$ with probability $(1 - r)/2$,
and we direct it from $v$ to $u$ with probability $(1 - r)/2$. Based on
this model, the probabilities of an undirected wedge and/or
triangle being of a certain type can be calculated through
simple calculations. This information is presented in Table 4
and Table 5.
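The null model can be sketched as a single pass over the undirected edges. This is a minimal illustration of the procedure as stated, not the code used for the experiments; the function name `random_directions` and the `seed` parameter are hypothetical.

```python
import random


def random_directions(undirected_edges, r, seed=None):
    """Assign each undirected edge a random type given reciprocity r.

    Returns (basic, reciprocal): basic holds ordered pairs (tail, head),
    reciprocal holds the edges that were made reciprocal.
    """
    rng = random.Random(seed)
    basic, reciprocal = [], []
    for u, v in undirected_edges:
        x = rng.random()  # uniform in [0, 1)
        if x < r:
            reciprocal.append((u, v))       # probability r
        elif x < r + (1.0 - r) / 2.0:
            basic.append((u, v))            # u -> v, probability (1 - r) / 2
        else:
            basic.append((v, u))            # v -> u, probability (1 - r) / 2
    return basic, reciprocal


edges = [(1, 2), (2, 3), (3, 1)]
basic, recip = random_directions(edges, r=0.3, seed=0)
```

Since every undirected edge keeps its endpoints, the undirected skeleton (and therefore the set of undirected wedges and triangles) is unchanged; only the types of wedges and triangles are randomized.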



Table 5 reveals that observations of the previous section 
cannot be explained by randomness or reciprocity. For in- 
stance, if we compare the expected fractions of the last two 
triangles 2-recip and 3-recip, we see that 2-recip should be 
more frequent when the reciprocity, r < 0.75. Even though 
this condition holds in most of the graphs in our data set, we 
observe the contrary behavior in real data sets, and 3-recip 
generally is more frequent than 2-recip. Another observation
is about loop triangles. According to our null model,
trans and loop triangles have the same dependence on
reciprocity, and trans triangles are expected to be only 3 times
more frequent than loop triangles. However, trans triangles
are much more frequent in practice. In other words, the
null model can explain the sparsity of loop triangles, but not
their near absence.
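The two comparisons in this paragraph can be checked directly from the null model as stated. Table 5 is not reproduced in this text, so the expressions below are a derivation sketch rather than a quotation of its entries. Each edge of an undirected triangle is independently reciprocal with probability $r$ and one-way in each direction with probability $(1-r)/2$. A triangle with no reciprocal edge has 8 equally likely orientations, of which 2 are cyclic (loop) and 6 are transitive (trans); a triangle with exactly two reciprocal edges can place its one-way edge in 3 positions, and both orientations of that edge give the same type.

```latex
\begin{align*}
\Pr[\text{3-recip}] &= r^3, &
\Pr[\text{2-recip}] &= 3r^2(1-r), \\
3r^2(1-r) > r^3 &\iff 3(1-r) > r \iff r < \tfrac{3}{4} \quad (r > 0), \\
\Pr[\text{loop}] &= 2\left(\tfrac{1-r}{2}\right)^3 = \tfrac{1}{4}(1-r)^3, &
\Pr[\text{trans}] &= 6\left(\tfrac{1-r}{2}\right)^3 = \tfrac{3}{4}(1-r)^3.
\end{align*}
```

The ratio $\Pr[\text{trans}]/\Pr[\text{loop}] = 3$ is independent of $r$, which is why the null model predicts that loop triangles are sparse but cannot account for their near absence.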

Figure 10 illustrates how the directed closure chart would 
look when direction is random. We take the undirected version
of the web-Google graph and add one-way and reciprocal
edges according to the random null model. If we compare this
figure to Figure 2, we see a totally different distribution, 
pointing to the significance of our results. Here, we are only 
presenting the results for web-Google due to space limita- 
tions, but we observed the same trend in all other graphs. 

Finally, in Figure 11 we look at two triangle types,
out-recip and path-recip, whose dependences on reciprocity are
the same, but one is overrepresented, while the other is
underrepresented, compared to the expectation of the null
hypothesis. Type out-recip-triangles are overrepresented in
all graphs except web-NotreDame and youtube-links, while
path-recip-triangles are underrepresented in all graphs.

Figure 11: Deviations from the null model: For various graphs, we plot the fraction of triangles of a given type together
with what is predicted by the random null model. This is done for the out-recip and path-recip-triangles. Observe the large
differences, showing that directed triangle distributions are far from random.

All these results show that the direction in triangles
reveals a special structure, which cannot be explained by
randomness or reciprocity.

5. COUNTING DIRECTED TRIANGLES 

The results in the previous section showed the impor- 
tance of computing directed closure charts. In this section, 
we turn our attention to how to perform this task efficiently
and describe approximation algorithms to estimate the var- 
ious clustering coefficients (and also the numbers of trian- 
gles). We extend the method of wedge sampling for directed 
graphs. 

We begin with some basic notation. We define the following
seven subsets of $W_\psi$. Let

$$W_\psi(\tau) = \{ w \in W_\psi \mid w \text{ is } \tau\text{-closed} \}.$$

Note that $\kappa_{\psi,\tau} = |W_\psi(\tau)|/|W_\psi|$. This fraction can now be
estimated through the following algorithmic template.

1. Select $k$ uniform random $\psi$-wedges (with replacement).

2. Determine $k'$, the number of $\tau$-closed wedges among
this sample.

3. Output estimate $\hat{\kappa}_{\psi,\tau} = k'/k$ for $\kappa_{\psi,\tau}$.
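The three steps of the template can be sketched as a short function. The wedge sampler and the closure test are passed in as callables; both, along with the function name `estimate_closure`, are hypothetical placeholders for whatever graph representation is in use.

```python
import random


def estimate_closure(sample_wedge, is_tau_closed, k, seed=None):
    """Estimate kappa_{psi,tau} from k uniform random psi-wedges.

    sample_wedge(rng) must return a uniform random psi-wedge (with replacement);
    is_tau_closed(wedge) must report whether that wedge is tau-closed.
    """
    rng = random.Random(seed)
    closed = 0
    for _ in range(k):                 # step 1: draw k wedges
        wedge = sample_wedge(rng)
        if is_tau_closed(wedge):       # step 2: count the tau-closed ones (k')
            closed += 1
    return closed / k                  # step 3: output k' / k


# Toy usage: a stand-in population in which one wedge out of four is closed.
population = [True, False, False, False]
estimate = estimate_closure(lambda rng: rng.choice(population), lambda w: w, k=20000, seed=1)
```

The running time depends only on the number of samples and the cost of one closure test, not on the number of wedges in the graph.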

The main theorem shows that this provides a good estimate
for $\kappa_{\psi,\tau}$. Similar versions of this theorem have appeared in
our earlier work [19, 20], but we provide a proof for
completeness. We first state Hoeffding's inequality.

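Theorem 5.1 (Hoeffding [12]). Let $X_1, X_2, \ldots, X_k$ be
independent random variables with $0 \le X_i \le 1$ for all $i = 1, \ldots, k$.
Define $X = \sum_{i=1}^{k} X_i$. Then for any $t > 0$, we have

$$\Pr[\,|X - \mathrm{E}[X]| \ge t\,] \le 2\exp(-2t^2/k).$$

Theorem 5.2. Let $\varepsilon, \delta > 0$ and set $k \ge \lceil 0.5\,\varepsilon^{-2}\ln(2/\delta) \rceil$. Then

$$\Pr[\,|\hat{\kappa}_{\psi,\tau} - \kappa_{\psi,\tau}| > \varepsilon\,] \le \delta.$$

Proof. Define the indicator random variable $X_i$ for the $i$th
$\psi$-wedge being $\tau$-closed (so $X_i = 1$ if the wedge is $\tau$-closed
and $0$ otherwise). Note that $\mathrm{E}[X_i] = \kappa_{\psi,\tau}$, so $\mathrm{E}[\sum_{i \le k} X_i] = k\kappa_{\psi,\tau}$.
Since $\hat{\kappa}_{\psi,\tau} = \sum_{i \le k} X_i / k$, the event $|\hat{\kappa}_{\psi,\tau} - \kappa_{\psi,\tau}| > \varepsilon$
is the same as $|\sum_{i \le k} X_i - \mathrm{E}[\sum_{i \le k} X_i]| > \varepsilon k$. By Theorem 5.1,
the probability of this event, by choice of $k$, is at most
$2\exp(-2\varepsilon^2 k^2/k) \le \delta$. □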
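The sample size in Theorem 5.2 and the matching error bound for a fixed number of samples are simple to evaluate. The sketch below only restates the formula; the function names `samples_needed` and `error_bound` are hypothetical.

```python
import math


def samples_needed(eps, delta):
    """Smallest k allowed by Theorem 5.2: ceil(0.5 * eps^-2 * ln(2 / delta))."""
    return math.ceil(0.5 * eps ** -2 * math.log(2.0 / delta))


def error_bound(k, delta):
    """Error eps guaranteed with probability 1 - delta when using k samples.

    Obtained by solving 2 * exp(-2 * eps^2 * k) = delta for eps.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * k))


k = samples_needed(eps=0.1, delta=0.1)
eps_5k = error_bound(k=5000, delta=0.001)  # roughly 0.028
```

Note that the required number of samples depends only on the error and confidence parameters, and not on the size of the graph.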

A direct corollary of this theorem gives bounds for triangle
estimates. This is obtained by multiplying the event inequality
in Theorem 5.2 by $|W_\psi|/\chi(\psi,\tau)$ and observing that
$|T_\tau| = \kappa_{\psi,\tau}|W_\psi|/\chi(\psi,\tau)$.

Corollary 5.3. Fix types $\psi$ and $\tau$ such that $\chi(\psi,\tau) \ne 0$.
Denote $\hat{T} = \hat{\kappa}_{\psi,\tau}|W_\psi|/\chi(\psi,\tau)$. Then

$$\Pr[\,|\hat{T} - |T_\tau|| > \varepsilon|W_\psi|/\chi(\psi,\tau)\,] \le \delta.$$

There are a few subtleties here worth mentioning. For the
same number of samples, we can use different wedge types
to estimate the same triangle count $|T_\tau|$. For a candidate type
$\psi$, the error is proportional to $|W_\psi|/\chi(\psi,\tau)$. Hence, using
wedge types that are less frequent gives stronger
approximations for the same triangle type. Another consequence of
this observation is that only 4 wedge types (e.g., in,
recip-out, recip-in, and recip-tot) are sufficient to compute the
numbers of all triangle types and thus the 15 closure rates.
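The effect of the wedge type on the triangle estimate can be illustrated with a few lines of arithmetic. The function names and all numbers below are hypothetical and chosen only to show the scaling of the error width in Corollary 5.3.

```python
def triangle_estimate(kappa_hat, num_wedges, chi):
    """Triangle count estimate: kappa_hat * |W_psi| / chi(psi, tau)."""
    return kappa_hat * num_wedges / chi


def triangle_error_width(eps, num_wedges, chi):
    """Additive error on the triangle count: eps * |W_psi| / chi(psi, tau)."""
    return eps * num_wedges / chi


# Same eps (same number of samples), two candidate wedge types for one triangle type.
# Hypothetical counts: a frequent wedge type versus a rare one.
width_frequent = triangle_error_width(eps=0.02, num_wedges=1_000_000, chi=1)
width_rare = triangle_error_width(eps=0.02, num_wedges=10_000, chi=2)
```

With these made-up counts the rare wedge type gives an error width that is 200 times smaller, which is the sense in which less frequent wedge types give stronger approximations.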

5.1 Uniform sampling of wedge types 

To give a full algorithm, we need to give a procedure that
samples uniform random wedges of any desired type. We
can split wedge types into two groups: homogeneous and
heterogeneous. Homogeneous wedges have only one kind of
edge, such as in, out, and recip-tot-wedges. Heterogeneous
wedges have different kinds of edges, such as path, recip-in,
and recip-out-wedges. The sampling procedures are analogous
for types within a group. Hence, we will only describe
how to sample uniform random in-wedges and random
path-wedges.

Figure 12: Speed-up over enumeration: We count the total
number of directed triangles through wedge sampling and
compare it to enumeration methods. We run our algorithm
for 5K, 10K, and 20K samples.

Figure 13: Improved accuracy with more wedge samples:
We focus on estimating the fraction of out-wedges that close
into an out-recip-triangle ($\kappa_{iii,c}$). We consider 5K, 10K, and 20K
wedge samples. The errors are all in the third decimal place.

First, we deal with in-wedges. Set $p_v = \binom{d_v^{\leftarrow}}{2}/|W_{iii}|$,
where $d_v^{\leftarrow}$ is the indegree of $v \in V$. Note that $\sum_{v \in V} p_v = 1$, so this forms a
probability distribution over $V$.

• Sample a random $v$ according to the distribution given
by $\{p_v\}$.

• Sample a uniform random pair $u, w$ of in-neighbors of $v$.

• Output the wedge $\{(u,v), (w,v)\}$.

This generates a uniform random in-wedge. The number of
in-wedges centered at $v$ is exactly $p_v|W_{iii}|$, and the second
step generates a uniform random in-wedge centered at $v$.
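The in-wedge sampler can be sketched as follows. This is an illustration under assumptions, not the C implementation used in the experiments: the graph is assumed to be given as a dictionary `in_nbrs` mapping each vertex to the list of its in-neighbors over basic (non-reciprocal) edges, and the function name `sample_in_wedge` is hypothetical.

```python
import random
from math import comb


def sample_in_wedge(in_nbrs, rng):
    """Return a uniform random in-wedge as a pair of edges ((u, v), (w, v))."""
    # Only vertices with at least two in-neighbors are centers of in-wedges.
    centers = [v for v, nbrs in in_nbrs.items() if len(nbrs) >= 2]
    # Weight of v is the number of in-wedges centered at v; normalizing by
    # the total number of in-wedges gives the distribution {p_v}.
    # (For many samples, these weights would be precomputed once.)
    weights = [comb(len(in_nbrs[v]), 2) for v in centers]
    v = rng.choices(centers, weights=weights, k=1)[0]
    # Uniform random pair of distinct in-neighbors of v.
    u, w = rng.sample(in_nbrs[v], 2)
    return (u, v), (w, v)


in_nbrs = {"c": ["a", "b"], "d": ["a"]}
e1, e2 = sample_in_wedge(in_nbrs, random.Random(0))
```

In this toy graph the only in-wedge is centered at `"c"`, so the sampler always returns the two edges entering `"c"`.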

Now for path-wedges. Set $p_v = d_v^{\leftarrow} d_v^{\rightarrow}/|W_{ii}|$, where $v \in V$
and $d_v^{\leftarrow}$, $d_v^{\rightarrow}$ are the indegree and outdegree of $v$.
Again, $\sum_{v \in V} p_v = 1$.

• Sample a random $v$ according to the distribution given
by $\{p_v\}$.

• Sample $u$, a uniform random in-neighbor of $v$, and $w$, a
uniform random out-neighbor.

• Output the wedge $\{(u,v), (v,w)\}$.

We can show that this is a uniform random path-wedge, using an
argument almost identical to that used above.
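The second procedure, which pairs an in-neighbor with an out-neighbor of the center, can be sketched in the same style. The dictionaries `in_nbrs` and `out_nbrs` are assumed to list neighbors over basic (non-reciprocal) edges only, so the same vertex never appears as both an in-neighbor and an out-neighbor of the center; the function name `sample_path_wedge` is hypothetical.

```python
import random


def sample_path_wedge(in_nbrs, out_nbrs, rng):
    """Return a uniform random wedge ((u, v), (v, w)) with u -> v -> w."""
    # Centers need at least one in-neighbor and one out-neighbor.
    centers = [v for v in in_nbrs if in_nbrs[v] and out_nbrs.get(v)]
    # Weight of v is indegree * outdegree, the number of such wedges at v.
    weights = [len(in_nbrs[v]) * len(out_nbrs[v]) for v in centers]
    v = rng.choices(centers, weights=weights, k=1)[0]
    u = rng.choice(in_nbrs[v])    # uniform random in-neighbor
    w = rng.choice(out_nbrs[v])   # uniform random out-neighbor
    return (u, v), (v, w)


in_nbrs = {"v": ["a"], "x": []}
out_nbrs = {"v": ["b", "c"], "x": ["v"]}
e_in, e_out = sample_path_wedge(in_nbrs, out_nbrs, random.Random(0))
```

Each wedge centered at a vertex is chosen with probability proportional to the vertex weight times the reciprocal of that weight, so all wedges of this type are equally likely.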

With these procedures, we can implement the wedge sam- 
pling algorithms for all wedge/triangle types. 

5.2 Experimental Results 

We implemented our algorithms in C and ran our experiments
on a computer equipped with a 2.3GHz Intel Core i7
processor with 4 cores and 256KB L2 cache (per core), 8MB 
L3 cache, and 8GB memory. We performed our experiments 
on 11 graphs, whose properties are presented in Table 3. 

In Figure 12, we compare the runtime of wedge sampling 
to the best enumeration algorithm. Our enumeration
algorithm is based on the principles of [4, 19, 6, 24], such that
each edge is assigned to its endpoint with the smaller total
degree, $d_v$ (using the vertex numbering as a tie-breaker), and
then vertices only check closure for wedges formed by edges 
assigned to them. Once a triangle is identified, it is classified 
according to its edges. As seen in Figure 12, wedge sampling 
works orders of magnitude faster than the enumeration al- 
gorithm. The timing results show tremendous savings; for 
instance, wedge sampling only takes 0.064 seconds on web- 
BerkStan while full enumeration takes 271 seconds. 
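The edge-assignment idea behind the enumeration baseline can be sketched on the undirected skeleton of the graph. This is a simplified illustration, not the authors' C code: it assumes an adjacency dictionary `adj` with comparable vertex ids, the function name `enumerate_triangles` is hypothetical, and the final step of classifying each triangle by its edge types is omitted because it depends on the type definitions in Figure 1b.

```python
from itertools import combinations


def enumerate_triangles(adj):
    """Yield each triangle of an undirected graph exactly once.

    adj maps each vertex to the set of its neighbors (reciprocal pairs merged).
    """
    # Order vertices by (total degree, vertex id); the id breaks ties.
    rank = {v: (len(adj[v]), v) for v in adj}
    # Each edge is assigned to its endpoint with the smaller rank.
    owned = {v: [u for u in adj[v] if rank[v] < rank[u]] for v in adj}
    for v in adj:
        # Only wedges formed by two edges assigned to v are checked for closure.
        for u, w in combinations(owned[v], 2):
            if w in adj[u]:
                yield (v, u, w)


# Complete graph on 4 vertices: it contains exactly 4 triangles.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
triangles = list(enumerate_triangles(k4))
```

Every triangle is reported at its lowest-ranked vertex, which owns both of its edges inside the triangle, so no triangle is counted twice and high-degree vertices check few wedges.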

Figure 13 shows the accuracy of the wedge sampling
algorithm, by displaying the sampling error in computing how
often an out-wedge closes into an out-recip-triangle. At 99.9%
confidence ($\delta = 0.001$), the upper bound on the error we
expect for 5K, 10K, and 20K samples is .028, .020, and .013,
respectively. In all our experiments, the observed error is
always much smaller than what is indicated by Theorem 5.1.
For instance, the maximum error for 5K samples is .0085,
much less than the 0.028 given by the upper bound.

Due to space limitations, we cannot present accuracy results
for other closure rates. However, we can report that the
proposed wedge sampling algorithm produced consistently 
more accurate results than what is indicated by Theorem 5.1 
for all graphs and all closure rates. 

6. CONCLUSIONS 

We initiate the study of directed triangles in massive net- 
works, by defining the set of directed closure measures. These 
quantities reveal a surprising amount of information about 
digraphs. We observe heterogeneity in closure rates of different
wedges, a strong effect of reciprocity on closure rates, and the
power of transitivity in the structure of triangles. Our re- 
sults also show that these observations cannot be explained 
merely by randomness or reciprocity. We hope that this pa- 
per leads the way in deeper studies into digraphs, and also 
convinces the data mining and social networks community 
that direction cannot be ignored. The fast estimation results
show that the measures can be computed in a scalable
manner.